Buffer Overflow Detection using Environment Refinement

Authors

  • Franjo Ivančić
  • Sriram Sankaranarayanan
  • Aarti Gupta
  • Ilya Shlyakhter
Abstract

Abstract Interpreter. Abstract interpretation [Cousot and Cousot 1977] is the main proof engine in our flow. While the focus of our effort is on finding bugs, proofs are valuable since they indicate the absence of a bug w.r.t. our modeling assumptions. Furthermore, proofs of properties enable semantic slicing and simplification of the program, further reducing its size and improving scalability. Our framework performs abstract interpretation over various numerical domains. Such domains track relations between integer variables in the program at each program location. An assertion ASSERT(φ) is proved by our abstract interpreter if φ can be established as an invariant. Our abstract interpreter is inter-procedural, flow-sensitive and context-sensitive. It is built in a domain-independent and extensible fashion, allowing for various abstract domains. Currently, we have implementations of abstract domains such as constants, intervals [Cousot and Cousot 1976], octagons [Miné 2001b], symbolic ranges [Sankaranarayanan et al. 2007] and polyhedra [Cousot and Halbwachs 1978]. These domains are organized in increasing order of complexity. To increase the number of properties proved, we have incorporated a path-sensitive analysis technique based on transforming the CFG using the concept of elaborations [Sankaranarayanan et al. 2006], and disjunctive polyhedral analysis using a modified widening-upto operator [Wang et al. 2007]. While path-sensitive analyses can be quite expensive, we employ a progression of analysis techniques, each more powerful than the preceding one. Thus, heavyweight analysis techniques operate on programs that have been simplified using the results of the preceding analyses. The invariants computed at each program point are used to infer safe intervals for the program variables. The intervals thus computed restrict the domain of these variables and are utilized by the model checker.
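As an illustration of how interval invariants can be inferred and then turned into a bit-width bound, the following is a minimal textbook-style sketch of interval analysis with widening and one narrowing pass. It is not the authors' implementation: the `Interval` class, `analyze_counter_loop`, `unsigned_bits`, and the analyzed loop `i = 0; while (i < bound) i = i + 1;` are all hypothetical names and inputs chosen for the example.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Interval:
    """A closed interval [lo, hi]; bounds may be +/- infinity."""
    lo: float
    hi: float

    def join(self, other: "Interval") -> "Interval":
        # Smallest interval containing both operands.
        return Interval(min(self.lo, other.lo), max(self.hi, other.hi))

    def meet(self, other: "Interval") -> Optional["Interval"]:
        # Intersection; None represents the empty interval.
        lo, hi = max(self.lo, other.lo), min(self.hi, other.hi)
        return Interval(lo, hi) if lo <= hi else None

    def widen(self, other: "Interval") -> "Interval":
        # Standard interval widening: any unstable bound jumps to infinity.
        lo = self.lo if other.lo >= self.lo else -math.inf
        hi = self.hi if other.hi <= self.hi else math.inf
        return Interval(lo, hi)


def analyze_counter_loop(bound: int) -> Interval:
    """Loop-head invariant for: i = 0; while (i < bound) i = i + 1;"""
    init = Interval(0, 0)

    def step(head: Interval) -> Interval:
        # Keep the values that pass the guard i < bound, apply i = i + 1,
        # and join the result with the value flowing in from loop entry.
        guarded = head.meet(Interval(-math.inf, bound - 1))
        if guarded is None:
            return init
        return init.join(Interval(guarded.lo + 1, guarded.hi + 1))

    head = init
    while True:  # upward iteration with widening until stable
        widened = head.widen(step(head))
        if widened == head:
            break
        head = widened
    return step(head)  # one narrowing pass to recover the finite bound


def unsigned_bits(rng: Interval) -> int:
    """Bits needed to represent every value of a finite, non-negative range."""
    if rng.lo < 0 or math.isinf(rng.hi):
        raise ValueError("sketch only handles finite, non-negative ranges")
    return max(1, int(rng.hi).bit_length())


invariant = analyze_counter_loop(100)
width = unsigned_bits(invariant)
```

Widening guarantees termination at the cost of precision (the upper bound first jumps to infinity), and the single narrowing pass recovers the finite bound implied by the loop guard. A production analyzer would work on a full control flow graph and handle signed ranges.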
For instance, if the range for a variable x is [0, 100], we may use 7 bits to represent x, as opposed to the 32 bits that would have been required in the absence of such information. This reduces the number of variables in the SAT problem, typically by 50-80%, and speeds up the model checker considerably. Furthermore, the invariants are directly converted to SAT formulas and used to constrain the model checking search. These optimizations have led to a large reduction in the size of the models and the time taken to check them [Ganai and Gupta 2006].

Modeling vs. Analysis. Our choice of abstract domains for the abstract interpreter has also influenced that of the memory model. First, the memory model avoids non-linear arithmetic such as multiplication of variables with each other. Therefore, linear abstract domains work well on the model. Second, the memory layout chosen avoids the use of strides in pointer arithmetic (except for handling typecasting). This ensures that pointer updates such as p := q + i are directly interpreted in our model without having to scale i by sizeof(*p). Furthermore, the unorthodox memory layout transformation depicted in Figure 4 avoids the need for complex arithmetic upon field accesses on (arrays of) structures. Instead, such accesses directly translate into array accesses on the flattened structure. Such modeling allows us to use domains such as octagons and intervals that do not model complex arithmetic involving non-unit coefficients.

Draft, December 2008.

Self Limitation Analysis. Our memory model tracks the sizes of arrays and pointers in a partially field-sensitive manner. However, it does not track all values/attributes of array elements, or field accesses beyond the cutoff depth. These are modelled as non-deterministic values.
In practice, their presence can lead to many false counterexamples that cannot be remedied by the environment refinement techniques presented later in Section 4. Therefore, we include a conservative dataflow analysis that identifies assertion checks which must (data) depend on these non-deterministic values. The removal of such checks reduces the number of false positives considerably. Note that self limitation analysis does not prune properties that depend on a non-deterministic environment. It primarily removes properties that depend on array contents or recursive structures. Such properties cannot be resolved by our memory modelling, or proved using user-defined annotations.

Model Checker. The model checker creates a finite state machine representation of the program to find bugs or proofs for the remaining properties. Each integer variable is treated as a 32-bit entity, each character variable as an 8-bit entity, and so on. However, the interval information provided by the abstract interpreter is used to reduce the number of bits significantly. As a result, we can construct a symbolic model of the program by bit-blasting the variables, the dataflow operations and the control flow structure using the range information. We use bit-accurate representations of all (finite bitwidth) operators, which yields a bit-accurate symbolic model of the program and ensures that arithmetic overflows are modeled faithfully. The model checker checks the symbolic model for reachability of the embedded assertion violation checks. We primarily use SAT-based bounded model checking (BMC) [Biere et al. 1999; Ganai and Gupta 2006]. This technique unrolls the program up to some depth d > 0 and searches for the presence of a bug at that depth by compilation into a SAT problem. The depth d is increased iteratively until a bug is found or resources run out. We have also implemented verification using BDDs [J.R. Burch et al. 1994] and mixed symbolic representations [Yang et al. 2006] to find proofs and violations for properties. Significantly, all the model checkers generate a counterexample trace that displays violations concretely. Such a trace vastly simplifies user inspection and evaluation of the error.

Witness Slicing & Diagnosis. We implement a path-based slicing algorithm on the witness traces [Jhala and Majumdar 2005]. The algorithm prunes irrelevant statements from the reported witnesses and aids user comprehension. We use a weakest precondition computation on the witness trace, starting from the assertion violation, to aid in its diagnosis. Our tool provides interfaces to the Eclipse(tm) front-end and can replay the trace through the gdb debugger for visualization (see Fig. 10).
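The iterative deepening scheme described under Model Checker can be sketched as follows. This is only an illustration under stated assumptions: explicit path enumeration stands in for the SAT encoding used by the actual tool, and the function names (`bounded_search`, `bmc`), the toy transition system, and the buffer size are hypothetical. The toy program advances an 8-bit index by 1 or 2 on each step (a non-deterministic input) and asserts that the index stays below the buffer size.

```python
from typing import Callable, List, Optional, Tuple

State = int  # in this toy model a state is just the value of the index


def bounded_search(
    init: State,
    successors: Callable[[State], List[State]],
    is_violation: Callable[[State], bool],
    depth: int,
) -> Optional[List[State]]:
    """Return a trace of exactly `depth` steps ending in a violation, else None."""
    frontier = [[init]]
    for _ in range(depth):  # unroll the transition relation `depth` times
        frontier = [
            trace + [nxt] for trace in frontier for nxt in successors(trace[-1])
        ]
    for trace in frontier:
        if is_violation(trace[-1]):
            return trace  # concrete counterexample trace
    return None


def bmc(
    init: State,
    successors: Callable[[State], List[State]],
    is_violation: Callable[[State], bool],
    max_depth: int,
) -> Optional[Tuple[int, List[State]]]:
    """Increase the depth until a bug is found or the budget is exhausted."""
    for depth in range(1, max_depth + 1):
        trace = bounded_search(init, successors, is_violation, depth)
        if trace is not None:
            return depth, trace
    return None  # no bug up to max_depth; this is not a proof of safety


BUF_SIZE = 4  # hypothetical buffer length


def successors(i: State) -> List[State]:
    # Non-deterministic increment by 1 or 2 with 8-bit wraparound, mirroring
    # the bit-accurate treatment of arithmetic overflow.
    return [(i + 1) & 0xFF, (i + 2) & 0xFF]


def is_violation(i: State) -> bool:
    # Models a failing ASSERT(i < BUF_SIZE) before a write to buf[i].
    return i >= BUF_SIZE


result = bmc(0, successors, is_violation, max_depth=8)
```

Here `max_depth` plays the role of the resource limit. Explicit enumeration grows exponentially with depth, which is why practical tools hand each unrolling to a SAT solver instead.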

Similar resources

A Buffer Overflow Benchmark for Software Model Checkers (Short Paper)

Software model checking based on abstraction-refinement has recently achieved widespread success in verifying API conformance in device drivers, and we believe this success can be replicated for the problem of buffer overflow detection. This paper presents a publicly-available benchmark suite to help guide and evaluate this research. The benchmark consists of 298 code fragments of varying compl...

Software Model-Checking: Benchmarking and Techniques for Buffer Overflow Analysis

Kelvin Ku, Master of Science thesis, Graduate Department of Computer Science, University of Toronto, 2008. Software model-checking based on abstraction-refinement has recently achieved widespread success in verifying critical properties of real-world device drivers. We believe this success can be replicated for the problem of...

Real-World Buffer Overflow Protection for Userspace and Kernelspace

Despite having been around for more than 25 years, buffer overflow attacks are still a major security threat for deployed software. Existing techniques for buffer overflow detection provide partial protection at best as they detect limited cases, suffer from many false positives, require source code access, or introduce large performance overheads. Moreover, none of these techniques are easily ...

Automated Generation of Buffer Overflow Quick Fixes Using Symbolic Execution and SMT

In many C programs, debugging requires significant effort and can consume a lot of time. Even if the bug’s cause is known, detecting a bug in such programs and generating a bug fix patch manually is a tedious task. In this paper, we present a novel approach used to generate bug fixes for buffer overflow automatically using static execution, code patch patterns, quick fix locations, user input s...

Precise Buffer Overflow Detection via Model Checking

Buffer overflows are the source of a vast majority of vulnerabilities in today’s software. Existing solutions for detecting buffer overflows, either statically or dynamically, have serious drawbacks that hinder their wider adoption by practitioners. In this paper we present an automated overflow detection technique based on model checking and iterative refinement. We discuss advantages, and limit...

Network-Based Buffer Overflow Detection by Exploit Code Analysis

Buffer overflow attacks continue to be a major security problem and detecting attacks of this nature is therefore crucial to network security. Signature based network based intrusion detection systems (NIDS) compare network traffic to signatures modelling suspicious or attack traffic to detect network attacks. Since detection is based on pattern matching, a signature modelling the attack must e...

Publication date: 2008